ObservationGenerator#

The obversation_generator module provides classes that generate and manage observations during beam search in the LMCSC (Language Model-based Corrector with Semantic Constraints) system. The module and class names keep the spelling used in the code (obversation), so imports must be written that way.

Key Components#

  • BaseObversationGenerator: Abstract base class for observation generators.

  • NextObversationGenerator: Concrete implementation that records beam search progress: what each beam has generated so far and which source characters remain.

BaseObversationGenerator#

BaseObversationGenerator is the abstract base class for observation generators. It defines the interface that every observation generator is expected to implement.

Key Methods:#

  • reorder: Reorders the beams based on given indices.

  • step: Performs a step in the beam search process.

  • show_steps: Displays the steps taken in the beam search process.

  • get_observed_sequences: Retrieves the observed sequences from the beam search process.
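The four methods above can be pictured with a small stand-in. This is a minimal sketch, not the library's code: SketchObservationGenerator and EchoGenerator are hypothetical names, the use of abc is an assumption (the documented base class derives from object), and the toy subclass keeps one beam per source string to stay short.

```python
from abc import ABC, abstractmethod
from typing import List, Union

Token = Union[str, bytes]


class SketchObservationGenerator(ABC):
    """Hypothetical stand-in that mirrors the four documented methods."""

    @abstractmethod
    def reorder(self, beam_idx: List[int]) -> None: ...

    @abstractmethod
    def step(self, token_lists: List[List[Token]]) -> None: ...

    @abstractmethod
    def show_steps(self) -> None: ...

    @abstractmethod
    def get_observed_sequences(self) -> List[str]: ...


class EchoGenerator(SketchObservationGenerator):
    """Toy subclass (assumption): one beam per source, string level only."""

    def __init__(self, src: List[str]) -> None:
        self.src = list(src)
        self.predicts = ["" for _ in self.src]

    def reorder(self, beam_idx: List[int]) -> None:
        # Keep sources and predictions aligned after the shuffle.
        self.src = [self.src[i] for i in beam_idx]
        self.predicts = [self.predicts[i] for i in beam_idx]

    def step(self, token_lists: List[List[str]]) -> None:
        # Append this step's tokens to the matching prediction.
        for i, tokens in enumerate(token_lists):
            self.predicts[i] += "".join(tokens)

    def show_steps(self) -> None:
        for source, predicted in zip(self.src, self.predicts):
            print(f"{source!r} -> {predicted!r}")

    def get_observed_sequences(self) -> List[str]:
        # Toy rule: whatever part of the source has not been covered yet.
        return [s[len(p):] for s, p in zip(self.src, self.predicts)]
```

The base class cannot be instantiated directly in this sketch, which is one way to enforce that subclasses supply all four methods.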

NextObversationGenerator#

The NextObversationGenerator class is a concrete implementation of BaseObversationGenerator. It records the progress of the beam search, tracking what has been generated so far and what characters are yet to be generated.

Key Features:#

  • Supports both string and byte-level operations

  • Tracks predictions, steps, and completion status for each beam

  • Provides verbose mode for detailed step tracking

  • Handles reordering of beams during search

  • Generates observed sequences based on the current state of the search
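To illustrate why the string and byte levels differ, the sketch below encodes a source to UTF-8 and decodes a partial byte prefix. The helper names to_units and decode_prefix are hypothetical and are not part of lmcsc; only standard str.encode and bytes.decode behavior is relied on.

```python
from typing import List, Union


def to_units(src: List[str], is_bytes_level: bool) -> List[Union[str, bytes]]:
    """Return sources as UTF-8 bytes at byte level, unchanged otherwise."""
    return [s.encode("utf-8") if is_bytes_level else s for s in src]


def decode_prefix(data: bytes) -> str:
    """Decode a byte prefix, dropping a trailing incomplete character."""
    return data.decode("utf-8", errors="ignore")


text = "你好"
encoded = to_units([text], is_bytes_level=True)[0]
print(len(text), len(encoded))     # 2 characters, 6 bytes
print(decode_prefix(encoded[:4]))  # only the first character is complete
```

A byte-level token can end in the middle of a multi-byte character, so progress counted in bytes does not map one-to-one onto characters.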

API Documentation#

class lmcsc.obversation_generator.BaseObversationGenerator#

Bases: object

reorder(beam_idx: List[int]) → None#
step(token_lists: List[List[str | bytes]]) → None#
show_steps() → None#
get_observed_sequences() → List[str]#
class lmcsc.obversation_generator.NextObversationGenerator(src, n_beam, n_observed_chars, is_bytes_level, verbose=False)#

Bases: BaseObversationGenerator

This class records the progress of the beam search, tracking what has been generated so far and what characters are yet to be generated.

Parameters:
  • src (List[str]) – The source sequences.

  • n_beam (int) – The number of beams for beam search.

  • n_observed_chars (int) – The number of characters to observe.

  • is_bytes_level (bool) – Whether to operate at the byte level.

  • verbose (bool, optional, defaults to False) – Whether to enable verbose mode.

src#

The source sequences, potentially encoded to bytes.

Type:

List[Union[str, bytes]]

n_beam#

The number of beams.

Type:

int

n_observed_chars#

The number of characters to observe.

Type:

int

is_bytes_level#

Whether operating at byte level.

Type:

bool

verbose#

Verbose mode flag.

Type:

bool

batch_predicts#

Predictions for each beam in each batch.

Type:

List[List[Union[str, bytes]]]

batch_steps#

Steps taken for each beam in each batch.

Type:

List[List[int]]

batch_verbose_steps#

Verbose steps for each beam in each batch.

Type:

List[List[List[Union[str, bytes]]]]

is_finished#

Flags indicating if each beam in each batch is finished.

Type:

List[List[bool]]
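The nested types listed for the per-beam attributes can be pictured as one entry per batch item, each holding one entry per beam. The sketch below builds such a layout; SketchState and init_state are hypothetical names, and the initial values (empty predictions, zero steps, unfinished beams) are assumptions rather than documented behavior.

```python
from dataclasses import dataclass
from typing import List, Union

Token = Union[str, bytes]


@dataclass
class SketchState:
    """Hypothetical container mirroring the documented attribute types."""
    batch_predicts: List[List[Token]]
    batch_steps: List[List[int]]
    batch_verbose_steps: List[List[List[Token]]]
    is_finished: List[List[bool]]


def init_state(n_batch: int, n_beam: int, is_bytes_level: bool) -> SketchState:
    """Build an n_batch x n_beam layout with assumed starting values."""
    empty: Token = b"" if is_bytes_level else ""
    return SketchState(
        batch_predicts=[[empty for _ in range(n_beam)] for _ in range(n_batch)],
        batch_steps=[[0 for _ in range(n_beam)] for _ in range(n_batch)],
        # Each beam gets its own list so appends do not leak across beams.
        batch_verbose_steps=[[[] for _ in range(n_beam)] for _ in range(n_batch)],
        is_finished=[[False for _ in range(n_beam)] for _ in range(n_batch)],
    )
```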

reorder(beam_idx: List[int]) → None#

Reorders the beams based on the given indices.

Parameters:

beam_idx (List[int]) – The indices used to reorder the beams.
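A minimal sketch of what reordering nested per-beam state could look like is given below. It assumes that beam_idx holds flat indices over batch × n_beam, as is common in beam search implementations; the documentation does not state this, and reorder_nested is a hypothetical helper rather than the library's method.

```python
from typing import List, TypeVar

T = TypeVar("T")


def reorder_nested(state: List[List[T]], beam_idx: List[int], n_beam: int) -> List[List[T]]:
    """Reorder an n_batch x n_beam nested list using flat beam indices."""
    # Flatten so that beam k of batch b sits at position b * n_beam + k.
    flat = [item for beams in state for item in beams]
    picked = [flat[i] for i in beam_idx]
    # Regroup into rows of n_beam entries.
    return [picked[i:i + n_beam] for i in range(0, len(picked), n_beam)]


predicts = [["a", "b"], ["c", "d"]]
# The first batch item keeps two copies of its second beam.
print(reorder_nested(predicts, [1, 1, 2, 3], n_beam=2))
```

Every per-beam attribute (predictions, steps, verbose steps, finished flags) would need the same reordering to stay aligned.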

step(token_lists: List[List[str | bytes]], step_lists: List[List[int]])#

Performs a step in the beam search process.

Parameters:
  • token_lists (List[List[Union[str, bytes]]]) – The tokens generated in this step.

  • step_lists (List[List[int]]) – The corresponding steps for each token.
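One way to picture a step is sketched below, under two stated assumptions: token_lists[b][k] is the token chosen for beam k of batch item b, and step_lists[b][k] is how far that token advances the beam. Neither is confirmed by the documentation, and apply_step is a hypothetical helper.

```python
from typing import List, Union

Token = Union[str, bytes]


def apply_step(
    batch_predicts: List[List[Token]],
    batch_steps: List[List[int]],
    token_lists: List[List[Token]],
    step_lists: List[List[int]],
) -> None:
    """Append each beam's token and add its step count, in place."""
    for b, (tokens, steps) in enumerate(zip(token_lists, step_lists)):
        for k, (token, n) in enumerate(zip(tokens, steps)):
            # str + str or bytes + bytes; mixing the two raises TypeError.
            batch_predicts[b][k] = batch_predicts[b][k] + token
            batch_steps[b][k] += n


predicts = [[b"", b""]]
steps = [[0, 0]]
apply_step(predicts, steps, [[b"ab", b"c"]], [[2, 1]])
print(predicts, steps)
```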

show_steps() → None#

Displays the steps taken in the beam search process.

get_observed_sequences() → List[str]#

Retrieves the observed sequences from the beam search process.

Returns:

The observed sequences for each beam in each batch.

Return type:

List[str]
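The relationship between per-beam progress and a flat List[str] result can be sketched as follows. The sketch assumes that an observed sequence is the next n_observed_chars of the source that a beam has not yet consumed, and that results are flattened batch by batch; both points are assumptions, and observed_sequences is a hypothetical helper that works at the string level only.

```python
from typing import List


def observed_sequences(
    src: List[str], batch_steps: List[List[int]], n_observed_chars: int
) -> List[str]:
    """Return one source window per beam, flattened batch by batch."""
    observed = []
    for source, beam_steps in zip(src, batch_steps):
        for pos in beam_steps:
            # Slicing past the end yields a shorter (possibly empty) window.
            observed.append(source[pos:pos + n_observed_chars])
    return observed


print(observed_sequences(["abcdef"], [[0, 2, 6]], n_observed_chars=3))
```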